The following objects are masked from 'package:stats':
filter, lag
The following objects are masked from 'package:base':
intersect, setdiff, setequal, union
library(ggplot2)
Background
About the dataset: The Spotify Million Song Dataset table has 57,650 rows and 4 columns (‘artist’, ‘song’, ‘link’, and ‘text’), all of string type. Each row describes one song, so the table supports analysis of lyrics both per song and per artist.
Questions
This report explores four questions. Each is listed below, followed by the general approach taken to answer it.
What are the most common themes or topics discussed in song lyrics?
Analyze the ‘text’ column to identify recurring words, phrases, or themes in song lyrics. Use techniques like word frequency analysis, topic modeling, or sentiment analysis to uncover prevalent themes or emotions.
Which artists have the largest vocabulary in their song lyrics?
Calculate the diversity of vocabulary for each artist by analyzing the unique words used in their song lyrics. Identify artists who use a wide range of vocabulary versus those who stick to a more limited set of words.
Are there any trends or patterns in the sentiment of song lyrics over time or across genres?
Perform sentiment analysis on the lyrics to determine the overall sentiment (positive, negative, neutral) of songs. Explore if there are shifts in sentiment over different time periods or if certain genres tend to exhibit particular emotional tones.
Which artists tend to use the most profanity?
To gauge profanity in song lyrics, compile a profanity list, break the lyrics into words or phrases, and flag the ones that match the list. Tally the matches and normalize the counts (for example, by the total number of words) so that artists with more lyrics are assessed fairly. Then analyze trends across artists, genres, or time periods, keeping in mind that profanity is subjective and depends on cultural context. This approach reveals differences in profanity usage among artists.
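The profanity approach above (match against a list, tally, normalize) can be sketched in base R as follows. This is a minimal illustration, not the report's implementation: `toy_songs`, `profanity_list`, and `tokenize_simple` are hypothetical names, the lyrics are made up, and the two-word list is a placeholder for a curated lexicon.

```r
# Toy data standing in for df_songs (made-up lyrics, not from the dataset)
toy_songs <- data.frame(
  artist = c("Artist A", "Artist A", "Artist B"),
  text   = c("Darn this darn rain", "Sunny day today", "Heck no, not today"),
  stringsAsFactors = FALSE
)

# Placeholder profanity list (assumption); a real analysis needs a curated lexicon
profanity_list <- c("darn", "heck")

# Simplified tokenizer: lowercase, turn non-letters into spaces, split on whitespace
tokenize_simple <- function(x) {
  cleaned <- gsub("[^a-z']+", " ", tolower(x))
  tokens <- strsplit(trimws(cleaned), "\\s+")[[1]]
  tokens[nzchar(tokens)]
}

# Tally total and profane tokens for each artist
profanity_by_artist <- do.call(rbind, lapply(split(toy_songs, toy_songs$artist), function(d) {
  tokens <- unlist(lapply(d$text, tokenize_simple))
  data.frame(
    artist        = d$artist[1],
    total_words   = length(tokens),
    profane_words = sum(tokens %in% profanity_list),
    stringsAsFactors = FALSE
  )
}))

# Normalize: profane words per 1,000 words, so long and short catalogs are comparable
profanity_by_artist$rate_per_1000 <-
  1000 * profanity_by_artist$profane_words / profanity_by_artist$total_words

# Highest rate first
profanity_by_artist <- profanity_by_artist[order(-profanity_by_artist$rate_per_1000), ]
print(profanity_by_artist)
```

Normalizing by total words is one possible choice; dividing by the number of songs would answer a slightly different question (profanity per song rather than per word).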
Importing Data
# Read from csv into object
df_songs <- read.csv("Spotify Million Song Dataset_exported.csv")
Cleaning Data
We remove the link column because it is not relevant to the questions in this report. We also add an identifier for each song, built from its row number in df_songs (for example, song_1), so that songs can be tracked between objects.
# Remove link variable
df_songs$link <- NULL
# Add a song number (identifier) so that songs can be tracked between objects
# Add a new column with row numbers in the desired format
df_songs <- df_songs %>%
  mutate(song_num = paste0("song_", row_number()))
General Data Wrangling
The code below preprocesses the lyrics for analysis. It initializes a list, list_songs, and loops over the rows of df_songs. For each song it reads the lyrics from the third column (text), converts them to character, and tokenizes them into words with unnest_tokens. Stopwords (common, non-informative words such as “the” and “and”) are then removed with an anti-join against a stopword list. The resulting word table is stored in list_songs under a name made of “song_” followed by the row index. Finally, the whole list is saved as an RDS file named “list_songs”. The loop is commented out here; the report reads the previously saved result back with readRDS instead of recomputing it.
# Initialize a list to store unnested word objects
#list_songs <- list()
# Iterate over each row of the dataframe
#for (i in 1:nrow(df_songs)) {
  # Read the text from the 3rd column
  #songs_text <- df_songs[i, 3]
  # Convert to character
  #songs_text <- as.character(songs_text)
  # Create a dataframe with the text
  #df_song <- data.frame(text = songs_text)
  # Unnest tokens
  #post_unnested <- unnest_tokens(df_song, word, text)
  # Antijoin with Stopwords for all objects in list
  #post_unnested_filtered <- anti_join(post_unnested, stop_words, by = "word")
  # Store the unnested word object in the list
  #list_songs[[paste0("song_", i)]] <- post_unnested_filtered
#}
#saveRDS(list_songs, "list_songs")
list_songs <- readRDS("list_songs")
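To make the two steps inside the loop concrete (tokenize, then drop stopwords), here is a minimal base R sketch on a made-up lyric. It is a simplification, not the report's pipeline: the regular-expression tokenizer only approximates unnest_tokens, and `toy_stop_words` is a hypothetical four-word stand-in for the much larger stop_words table used above.

```r
# Made-up lyric and a tiny stand-in for the stopword list (both assumptions)
lyric <- "The sun and the moon are dancing in the sky"
toy_stop_words <- c("the", "and", "are", "in")

# Step 1: tokenize (lowercase, turn non-letters into spaces, split on whitespace)
cleaned <- gsub("[^a-z']+", " ", tolower(lyric))
tokens <- strsplit(trimws(cleaned), "\\s+")[[1]]

# Step 2: keep only tokens that are not stopwords (the role of the anti-join)
kept <- tokens[!tokens %in% toy_stop_words]

# One row per remaining word, tagged with a song identifier
df_tokens <- data.frame(song_num = "song_1", word = kept, stringsAsFactors = FALSE)
print(df_tokens)
```

Keeping the identifier as a column, rather than only as a list name, is one way to make it easier to join the words back to df_songs later.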
Question 1: What are the most common themes or topics discussed in song lyrics?
For this, we can perform a word frequency analysis to identify recurring words or phrases in the song lyrics.
# Combine all words from all songs into a single dataframe
all_words <- bind_rows(list_songs)
# Perform word frequency analysis
word_freq <- all_words %>%
  count(word, sort = TRUE)
# Visualize the top 20 most common words
top_words <- word_freq %>%
  slice_max(n = 20, order_by = n)
# Plotting the top words
ggplot(top_words, aes(x = reorder(word, n), y = n)) +
  geom_col(fill = "skyblue") +
  labs(title = "Top 20 Most Common Words in Song Lyrics",
       x = "Word",
       y = "Frequency") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
Question 2: Which artists have the largest vocabulary in their song lyrics?
We can calculate the diversity of vocabulary for each artist by analyzing the unique words used in their song lyrics.
#artist_dfs <- list()
# Get unique artists from df_songs
#unique_artists <- unique(df_songs$artist)
# Loop through each artist
#for (artist in unique_artists) {
  # Filter rows where artist is the current artist
  #artist_songs <- df_songs[df_songs$artist == artist, ]
  # Combine text for the current artist
  #artist_text <- paste(artist_songs$text, collapse = ' ')
  # Create DataFrame for the current artist
  #artist_df <- data.frame(artist = artist, text_combined = artist_text)
  # Append the dataframe to the list
  #artist_dfs[[artist]] <- artist_df
#}
# Combine all artist dataframes into a single dataframe
#all_artists_df <- do.call(rbind, artist_dfs)
# Print the combined dataframe
#print(all_artists_df)
#saveRDS(all_artists_df, file = "all_artists_df")
all_artists_df <- readRDS("all_artists_df")
# Initialize an empty list to store tokenized words for each artist
#tokenized_words_list <- list()
# Loop through each artist in the dataframe
#for (artist in unique(all_artists_df$artist)) {
  # Filter dataframe for the current artist
  #artist_df <- all_artists_df[all_artists_df$artist == artist, ]
  # Tokenize the combined lyrics for the current artist
  #tokenized_words <- unlist(tokenize_words(artist_df$text_combined))
  # Add tokenized words to the list
  #tokenized_words_list[[artist]] <- tokenized_words
#}
#saveRDS(tokenized_words_list, file = "tokenized_words_list")
tokenized_words_list <- readRDS("tokenized_words_list")
# Initialize an empty dataframe to store vocabulary sizes
vocabulary_df <- data.frame(artist = character(), vocabulary_size = numeric())
# Loop through each artist in the tokenized words list
for (artist in names(tokenized_words_list)) {
  # Get tokenized words for the current artist
  tokenized_words <- tokenized_words_list[[artist]]
  # Calculate vocabulary size (number of unique words)
  vocabulary_size <- length(unique(tokenized_words))
  # Add artist and vocabulary size to the dataframe
  vocabulary_df <- rbind(vocabulary_df, data.frame(artist = artist, vocabulary_size = vocabulary_size))
}
# Sort the dataframe by vocabulary size
vocabulary_df <- vocabulary_df[order(vocabulary_df$vocabulary_size, decreasing = TRUE), ]
# Print the dataframe to see which artists have the biggest and smallest vocabularies
print(vocabulary_df)
artist vocabulary_size
303 LL Cool J 6677
224 Insane Clown Posse 6362
297 Lil Wayne 6316
139 Fabolous 6081
582 Wu-Tang Clan 5603
120 Eminem 5262
252 Joni Mitchell 5230
262 Kanye West 5043
210 Ice Cube 5015
41 Bob Dylan 4921
243 Jimmy Buffett 4821
565 Weird Al Yankovic 4721
186 Gucci Mane 4702
108 Drake 4646
317 Marillion 4616
354 Nick Cave 4554
387 Outkast 4514
448 Red Hot Chili Peppers 4387
221 Indigo Girls 4265
406 Phish 4241
357 Nicki Minaj 4235
473 Snoop Dogg 4234
462 Rush 4231
51 Bruce Springsteen 4188
596 Xzibit 4161
31 Beautiful South 4137
307 Lou Reed 4105
273 Kid Rock 4040
24 Arrogant Worms 3982
419 Puff Daddy 3977
118 Elvis Costello 3962
364 NOFX 3952
456 Robbie Williams 3940
396 Paul Simon 3922
249 John Prine 3921
476 Squeeze 3907
366 Notorious B.I.G. 3891
251 Johnny Cash 3869
234 J Cole 3820
73 Clash 3785
86 David Bowie 3765
278 Kinks 3753
621 Z-Ro 3748
11 Alice Cooper 3744
436 R. Kelly 3739
327 Megadeth 3730
135 Everlast 3697
595 XTC 3692
212 Iggy Pop 3688
471 Slayer 3672
257 Judas Priest 3663
512 Tragically Hip 3622
205 Horrible Histories 3611
228 Iron Maiden 3608
65 Chris Brown 3604
117 Elton John 3601
180 Gordon Lightfoot 3575
329 Metallica 3573
3 Adam Sandler 3560
457 Rod Stewart 3548
5 Aerosmith 3532
505 Tom T. Hall 3524
192 Hank Williams Jr. 3518
37 Billy Joel 3503
573 Who 3493
182 Grateful Dead 3438
270 Kenny Chesney 3433
411 Pogues 3431
155 Frank Zappa 3428
237 James Taylor 3428
100 Dolly Parton 3355
399 Pet Shop Boys 3350
198 Helloween 3346
34 Bette Midler 3335
244 John Denver 3330
218 Incubus 3324
615 Young Jeezy 3312
519 UB40 3307
425 Queen 3296
248 John Mellencamp 3294
394 Patti Smith 3286
185 Green Day 3277
78 Counting Crows 3276
458 Rolling Stones 3276
85 David Allan Coe 3260
397 Pearl Jam 3258
264 Kate Bush 3255
318 Marilyn Manson 3250
295 Leonard Cohen 3237
109 Dream Theater 3215
347 Nazareth 3211
8 Alabama 3199
299 Linkin Park 3175
167 Genesis 3169
315 Mariah Carey 3161
121 Emmylou Harris 3155
388 Overkill 3152
175 Glee 3151
91 Deep Purple 3146
340 Morrissey 3146
44 Bob Seger 3126
284 Kris Kristofferson 3115
45 Bon Jovi 3113
69 Christmas Songs 3111
287 Lady Gaga 3109
239 Jason Mraz 3108
169 George Harrison 3094
70 Christy Moore 3092
28 Barbra Streisand 3084
190 Hank Snow 3070
351 Neil Young 3062
401 Peter Gabriel 3055
55 Carly Simon 3052
348 Ne-Yo 3049
362 Nitty Gritty Dirt Band 3019
235 Jackson Browne 3018
332 Michael Jackson 3016
38 Bing Crosby 2996
443 Randy Travis 2992
82 Cyndi Lauper 2985
390 Ozzy Osbourne 2982
501 Tim McGraw 2980
508 Tori Amos 2978
455 Rihanna 2974
151 Flo-Rida 2966
259 Judy Garland 2962
439 Rage Against The Machine 2940
415 Primus 2935
373 Oingo Boingo 2926
408 Pitbull 2923
74 Cliff Richard 2922
29 Beach Boys 2910
283 Korn 2908
483 Sting 2906
536 Usher 2905
67 Christina Aguilera 2903
23 Arlo Guthrie 2882
482 Stevie Wonder 2872
407 Pink Floyd 2856
559 Warren Zevon 2854
260 Justin Bieber 2850
410 P!nk 2848
518 U2 2844
184 Great Big Sea 2837
416 Prince 2834
363 Noa 2832
104 Donna Summer 2823
196 Harry Connick, Jr. 2817
132 Eurythmics 2812
583 Wyclef Jean 2811
62 Cher 2810
191 Hank Williams 2805
486 Styx 2789
84 Dave Matthews Band 2786
540 Van Morrison 2771
316 Marianne Faithfull 2770
16 America 2768
172 George Strait 2767
335 Miley Cyrus 2762
321 Mary Black 2742
71 Chuck Berry 2738
15 Alphaville 2731
272 Kenny Rogers 2731
333 Michael W. Smith 2725
115 Ella Fitzgerald 2719
63 Chicago 2709
447 Reba Mcentire 2705
39 Black Sabbath 2694
546 Venom 2684
40 Blur 2682
92 Def Leppard 2679
94 Depeche Mode 2677
442 Ramones 2674
479 Steely Dan 2668
344 Natalie Cole 2658
603 Yes 2658
446 Ray Charles 2656
59 Celine Dion 2655
154 Frank Sinatra 2654
359 Nina Simone 2652
506 Tom Waits 2637
437 Radiohead 2616
60 Chaka Khan 2613
543 Vanilla Ice 2607
95 Devo 2602
203 Hollies 2602
389 Owl City 2601
638 ZZ Top 2591
308 Louis Armstrong 2587
371 Offspring 2579
427 Queen Latifah 2565
288 Lana Del Rey 2553
575 Will Smith 2553
292 Leann Rimes 2551
119 Elvis Presley 2550
17 Amy Grant 2549
145 Fall Out Boy 2548
574 Widespread Panic 2534
176 Glen Campbell 2532
306 Loretta Lynn 2532
330 Michael Bolton 2525
240 Jennifer Lopez 2521
498 Thin Lizzy 2521
280 Kirsty Maccoll 2520
89 Dean Martin 2518
510 Townes Van Zandt 2516
562 Waylon Jennings 2516
265 Katy Perry 2512
290 Lauryn Hill 2510
50 Britney Spears 2508
97 Diana Ross 2503
188 Guns N' Roses 2503
312 Lynyrd Skynyrd 2493
576 Willie Nelson 2487
236 The Jam 2485
418 Procol Harum 2482
171 George Michael 2478
242 Jimi Hendrix 2477
114 Electric Light Orchestra 2473
349 Neil Diamond 2472
395 Paul McCartney 2470
47 Bonnie Raitt 2469
271 Kenny Loggins 2466
286 Kylie Minogue 2465
414 Pretenders 2463
268 Kelly Clarkson 2454
293 Lenny Kravitz 2445
209 Ian Hunter 2440
170 George Jones 2438
281 Kiss 2434
376 Olivia Newton-John 2434
313 Madonna 2426
207 Howard Jones 2424
534 Uriah Heep 2424
152 Foo Fighters 2423
627 Zebrahead 2423
83 Dan Fogelberg 2421
326 Meat Loaf 2399
164 Garth Brooks 2398
537 Utada Hikaru 2394
102 Don McLean 2383
127 Erasure 2380
66 Chris Rea 2364
358 Nightwish 2363
398 Perry Como 2349
336 Misfits 2345
338 The Monkees 2345
509 Toto 2345
106 Doors 2342
77 Conway Twitty 2341
339 Moody Blues 2339
247 John McDermott 2338
429 Queensryche 2338
1 ABBA 2334
551 Vince Gill 2334
30 The Beatles 2333
444 Rascal Flatts 2331
187 Guided By Voices 2328
539 Van Halen 2328
560 W.A.S.P. 2328
563 Ween 2324
110 Dusty Springfield 2323
134 Everclear 2320
581 Wiz Khalifa 2320
503 Tom Jones 2316
25 Avril Lavigne 2314
26 Backstreet Boys 2313
413 Poison 2292
61 Cheap Trick 2289
13 Alison Krauss 2281
193 Hanson 2275
173 Gino Vannelli 2271
343 Nat King Cole 2265
403 Pharrell Williams 2251
298 Linda Ronstadt 2249
267 Keith Urban 2244
466 Scorpions 2243
350 Neil Sedaka 2237
405 Phineas And Ferb 2236
520 Ufo 2236
478 Status Quo 2233
142 Faith Hill 2231
412 Point Of Grace 2231
33 Bee Gees 2227
490 Talking Heads 2220
226 INXS 2218
538 Utopia 2216
197 Heart 2213
452 Reo Speedwagon 2213
277 King Diamond 2206
453 Richard Marx 2196
369 Oasis 2195
36 Billie Holiday 2194
150 Fleetwood Mac 2191
165 Gary Numan 2191
459 Roxette 2191
245 John Legend 2187
255 Journey 2185
605 Ying Yang Twins 2185
311 Luther Vandross 2183
9 Alan Parsons Project 2177
46 Boney M. 2177
555 Vybz Kartel 2176
561 Waterboys 2175
461 Roy Orbison 2167
98 Dire Straits 2163
493 The Temptations 2149
6 Air Supply 2147
564 Weezer 2140
356 Nickelback 2139
449 Regine Velasquez 2139
19 Andy Williams 2136
274 The Killers 2135
43 Bob Rivers 2128
254 Josh Groban 2124
502 Tina Turner 2122
12 Alice In Chains 2121
511 Tracy Chapman 2113
557 Wanda Jackson 2113
392 Pat Benatar 2107
105 Doobie Brothers 2100
177 Gloria Estefan 2091
487 Sublime 2085
320 Maroon 5 2082
261 Justin Timberlake 2080
143 Faith No More 2057
269 Kelly Family 2054
58 Cat Stevens 2053
314 Manowar 2053
440 Rainbow 2046
57 Carpenters 2042
361 Nirvana 2042
64 Children 2039
80 Crowded House 2034
279 Kirk Franklin 2031
14 Allman Brothers Band 2028
491 Taylor Swift 2021
513 Train 2018
250 John Waite 2007
331 Michael Buble 2007
113 Eddie Cochran 1998
93 Demi Lovato 1993
304 Lloyd Cole 1975
374 Old 97's 1974
294 Leo Sayer 1973
566 Westlife 1962
445 Ray Boltz 1959
381 Opeth 1957
229 Irving Berlin 1952
417 Proclaimers 1941
450 Religious Music 1939
460 Roxy Music 1924
579 Wishbone Ash 1923
480 Steve Miller Band 1916
208 Human League 1906
128 Eric Clapton 1901
54 Cake 1892
504 Tom Lehrer 1892
365 Norah Jones 1886
492 Tears For Fears 1885
112 Ed Sheeran 1883
420 Q-Tip 1880
606 Yngwie Malmsteen 1877
195 Harry Belafonte 1867
368 O.A.R. 1865
552 Violent Femmes 1862
360 Nine Inch Nails 1860
370 Ocean Colour Scene 1856
607 Yo Gotti 1853
553 Virgin Steele 1850
385 Our Lady Peace 1842
572 Whitney Houston 1841
138 Extreme 1837
464 Santana 1837
75 Coldplay 1835
291 Lea Salonga 1834
107 Doris Day 1832
241 Jim Croce 1819
296 Les Miserables 1818
276 King Crimson 1812
523 Ultravox 1805
489 System Of A Down 1803
488 Supertramp 1799
609 Yoko Ono 1794
612 You Am I 1791
52 Bruno Mars 1787
149 Fiona Apple 1787
352 New Order 1786
616 Youngbloodz 1784
300 Lionel Richie 1774
631 Ziggy Marley 1774
404 Phil Collins 1773
166 Gary Valenciano 1771
202 HIM 1766
238 Janis Joplin 1764
469 Sia 1762
122 Engelbert Humperdinck 1760
162 Freestyle 1745
379 One Direction 1745
522 Ultramagnetic Mc's 1742
116 Ellie Goulding 1741
101 Don Henley 1732
342 'n Sync 1714
514 Travis 1709
310 Lucky Dube 1703
524 Uncle Kracker 1700
599 Yelawolf 1698
472 Smiths 1694
42 Bob Marley 1691
124 Enrique Iglesias 1679
428 Queens Of The Stone Age 1679
474 Soundgarden 1675
201 Hillsong United 1668
613 Young Buck 1664
153 Foreigner 1661
181 Grand Funk Railroad 1653
266 Keith Green 1653
258 Judds 1647
500 Tim Buckley 1639
2 Ace Of Base 1634
570 The White Stripes 1633
571 Whitesnake 1633
90 Death 1621
372 Ofra Haza 1610
378 Omd 1609
230 Isley Brothers 1605
384 Otis Redding 1599
130 Etta James 1585
81 Culture Club 1584
337 Modern Talking 1579
535 Used 1579
485 Stone Temple Pilots 1557
140 Face To Face 1546
200 Hillsong 1538
432 Quiet Riot 1526
275 Kim Wilde 1521
600 Yello 1520
355 Nick Drake 1500
507 Tool 1497
346 Natalie Imbruglia 1486
49 Bread 1470
345 Natalie Grant 1463
391 Passenger 1455
580 Within Temptation 1455
146 Fastball 1454
211 Idina Menzel 1452
441 Rammstein 1448
526 Underoath 1448
147 Fatboy Slim 1437
527 Underworld 1426
220 Indiana Bible College 1424
541 Vanessa Williams 1422
21 Ariana Grande 1419
515 Twenty One Pilots 1418
601 Yellowcard 1417
20 Annie 1416
256 Joy Division 1416
131 Europe 1413
402 Peter Tosh 1411
554 Vonda Shepard 1409
325 Mc Hammer 1403
548 Vertical Horizon 1394
544 Velvet Underground 1391
377 Olly Murs 1390
323 Matt Redman 1388
334 Migos 1386
400 Peter Cetera 1383
451 Rem 1367
386 Out Of Eden 1360
217 Incognito 1352
246 John Martyn 1336
499 Tiffany 1330
133 Evanescence 1328
4 Adele 1316
475 Spandau Ballet 1316
48 Bosson 1314
496 The Script 1309
618 Yukmouth 1303
87 David Guetta 1302
141 Faces 1302
178 Gloria Gaynor 1302
53 Bryan White 1295
610 Yolanda Adams 1295
285 Kyla 1291
72 Cinderella 1282
567 Wet Wet Wet 1282
222 Ingrid Michaelson 1279
375 Oliver 1276
393 Patsy Cline 1262
569 Whiskeytown 1262
528 Unearth 1261
204 Hooverphonic 1257
525 Uncle Tupelo 1241
328 Men At Work 1238
18 Andrea Bocelli 1226
157 Frankie Laine 1223
619 Yung Joc 1222
549 Veruca Salt 1219
183 Grease 1213
521 Ugly Kid Joe 1209
484 Stone Roses 1208
608 Yo La Tengo 1207
111 Eagles 1200
622 Zac Brown Band 1192
533 Unwritten Law 1191
633 Zoegirl 1186
532 Unseen 1183
467 Selah 1181
468 Selena Gomez 1180
289 Lata Mangeshkar 1177
465 Savage Garden 1164
309 Louis Jordan 1162
301 Little Mix 1139
79 Creedence Clearwater Revival 1124
168 George Formby 1124
214 Imagine Dragons 1121
129 Erik Santos 1112
383 Oscar Hammerstein 1105
604 YG 1100
598 Yeah Yeah Yeahs 1095
586 X-Raided 1092
156 Frankie Goes To Hollywood 1082
454 Rick Astley 1080
426 Queen Adreena 1072
550 Verve 1071
614 Young Dro 1064
636 Zucchero 1058
433 Quietdrive 1055
594 Xscape 1042
409 Planetshakers 1035
10 Aled Jones 1030
382 Orphaned Land 1022
497 The Weeknd 1022
380 OneRepublic 1021
624 Zao 1020
158 Frankie Valli 1018
477 Starship 1018
591 Xavier Rudd 1017
161 Free 1003
123 Enigma 992
99 Divine 990
194 Happy Mondays 987
558 Wang Chung 986
163 Fun. 983
435 Quincy Punx 981
322 Matt Monro 979
7 Aiza Seguerra 977
206 Housemartins 972
495 The Broadways 963
216 Imperials 957
422 Quarashi 957
144 Falco 956
35 Bill Withers 944
125 Enya 940
22 Ariel Rivera 936
68 Christina Perri 931
324 Mazzy Star 914
494 Ten Years After 911
96 Dewa 19 904
530 Unkle 892
568 Wham! 887
233 Iwan Fals 877
438 Raffi 869
577 Wilson Phillips 868
213 Il Divo 865
319 Mark Ronson 864
593 Xiu Xiu 854
424 Quasi 842
76 Cole Porter 828
589 Xandria 827
189 Halloween 800
27 Barbie 798
587 X-Ray Spex 798
635 Zox 795
341 Mud 794
423 Quarterflash 790
434 Quincy Jones 790
56 Carol Banawa 789
32 Beauty And The Beast 788
629 Zero 7 778
199 High School Musical 777
584 X 777
103 Don Moen 775
597 Yazoo 773
225 Inside Out 771
253 Jose Mari Chan 771
305 Lorde 767
263 Kari Jobe 759
630 Zeromancer 734
481 Stevie Ray Vaughan 725
578 Wilson Pickett 720
623 Zakk Wylde 718
160 Freddie King 703
617 Youth Of Today 703
215 Imago 690
223 Inna 689
148 Fifth Harmony 686
585 X Japan 680
542 Vangelis 666
232 Israel Houghton 665
430 Quicksand 664
626 Zebra 660
641 Van Der Graaf Generator 658
219 Independence Day 653
159 Freddie Aguilar 644
463 Sam Smith 636
88 David Pomeranz 635
625 Zayn Malik 633
516 U. D. O. 625
620 Yusuf Islam 604
431 Quicksilver Messenger Service 597
639 Joseph And The Amazing Technicolor Dreamcoat 589
637 Zwan 577
421 Qntal 567
611 Yonder Mountain String Band 546
231 Israel 539
353 Next To Normal 499
602 Yeng Constantino 493
470 Side A 464
367 O-Zone 460
545 Vengaboys 460
592 Xentrix 458
556 Walk The Moon 454
179 GMB 453
547 Vera Lynn 424
227 Iron Butterfly 422
634 Zornik 383
302 Little Walter 381
136 Exo 375
590 Xavier Naidoo 345
137 Exo-K 315
282 Koes Plus 299
531 Unknown 287
640 Soundtracks 238
642 Various Artists 207
174 Gipsy Kings 195
643 Zazie 182
126 Eppu Normaali 154
529 Ungu 86
628 Zed 77
517 U-Kiss 76
632 Zoe 56
588 X-Treme 37
The code above builds vocabulary_df, a dataframe of vocabulary sizes per artist. It starts from an empty dataframe, then iterates through each artist in tokenized_words_list, computes the vocabulary size as the number of unique tokens, and appends the artist name and size with rbind(). After the loop, the dataframe is sorted in descending order of vocabulary size and printed, so the artists with the largest vocabularies appear first and those with the smallest appear last.
Each row of vocabulary_df represents one artist, with the artist’s name in artist and the vocabulary size in vocabulary_size. A column can be referenced as vocabulary_df$column_name. For example, vocabulary_df$vocabulary_size[1] returns the vocabulary size in the first row; because the dataframe is sorted in descending order, that is the largest value (6677, for LL Cool J).
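As a compact illustration of the same calculation and of column indexing, the sketch below computes vocabulary sizes with vapply() instead of growing a dataframe inside a loop. It assumes the input is a named list of character vectors, as tokenized_words_list appears to be; `toy_tokens` and `toy_vocabulary_df` are hypothetical names with made-up data.

```r
# Toy stand-in for tokenized_words_list: a named list of character vectors
toy_tokens <- list(
  "Artist A" = c("love", "love", "night", "day"),
  "Artist B" = c("fire", "rain", "fire", "sky", "sea", "rain")
)

# Vocabulary size = number of distinct tokens per artist
vocab_sizes <- vapply(toy_tokens, function(w) length(unique(w)), integer(1))

toy_vocabulary_df <- data.frame(
  artist          = names(vocab_sizes),
  vocabulary_size = unname(vocab_sizes),
  stringsAsFactors = FALSE
)

# Sort from largest to smallest vocabulary
toy_vocabulary_df <- toy_vocabulary_df[
  order(toy_vocabulary_df$vocabulary_size, decreasing = TRUE), ]

# After sorting, position 1 holds the artist with the largest vocabulary
top_artist <- toy_vocabulary_df$artist[1]
top_size   <- toy_vocabulary_df$vocabulary_size[1]
print(toy_vocabulary_df)
```

The result matches what the loop produces; vapply() simply avoids copying the dataframe on every iteration, which may matter with several hundred artists.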
# Smallest and Largest
top5_largest <- vocabulary_df[1:5, ]
top5_smallest <- vocabulary_df[(nrow(vocabulary_df) - 4):nrow(vocabulary_df), ]
# Combine the top 5 largest and smallest dataframes
combined_df <- rbind(top5_largest, top5_smallest)
# Add a column to indicate whether the artist has the largest or smallest vocabulary
combined_df$category <- ifelse(combined_df$artist %in% top5_largest$artist, "Largest", "Smallest")
# Reorder the artists within each category
combined_df$artist <- factor(combined_df$artist, levels = combined_df$artist[order(combined_df$category, combined_df$vocabulary_size)])
# Plot the combined data
ggplot(combined_df, aes(x = reorder(artist, vocabulary_size), y = vocabulary_size)) +
  geom_bar(stat = "identity", aes(fill = category)) +
  facet_wrap(~ category, scales = "free", nrow = 1) +
  labs(title = "Top 5 Artists with Largest and Smallest Vocabulary",
       x = "Artist",
       y = "Vocabulary Size",
       fill = "Category") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
The interactive scatter plot visualizes the relationship between the number of songs each artist has in the dataset and the length of their combined lyrics. Each point represents an artist, with the number of songs on the x-axis and the length of the combined text on the y-axis. Hovering over a point reveals the artist’s name. The plot makes it easy to spot patterns and outliers, such as artists whose combined lyrics are unusually long or short relative to their number of songs.
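# Count unique whitespace-separated strings in each artist's combined lyrics
# (the count is stored in the text_combined column).
# Note: unlike tokenize_words(), this split is case-sensitive and keeps punctuation,
# so these counts tend to be larger than vocabulary_size above.
unique_words <- aggregate(text_combined ~ artist, data = all_artists_df, FUN = function(x) length(unique(unlist(strsplit(x, "\\s+")))))
# Compute the frequency of each artist in df_songs$artist
artist_counts <- as.data.frame(table(df_songs$artist), stringsAsFactors = FALSE)
colnames(artist_counts) <- c("artist", "num_songs")
# Merge artist counts with unique_words dataframe
unique_words <- merge(unique_words, artist_counts, by = "artist", all.x = TRUE)
# View the updated unique_words dataframe
print(unique_words)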
artist text_combined num_songs
1 'n Sync 2828 93
2 ABBA 3224 113
3 Ace Of Base 2317 74
4 Adam Sandler 5135 70
5 Adele 1835 54
6 Aerosmith 5038 171
7 Air Supply 3170 174
8 Aiza Seguerra 1202 25
9 Alabama 4525 187
10 Alan Parsons Project 2886 102
11 Aled Jones 1275 23
12 Alice Cooper 5237 174
13 Alice In Chains 2957 95
14 Alison Krauss 3205 145
15 Allman Brothers Band 3186 116
16 Alphaville 3533 105
17 America 3931 184
18 Amy Grant 3753 147
19 Andrea Bocelli 1529 25
20 Andy Williams 2994 138
21 Annie 1989 32
22 Ariana Grande 2032 51
23 Ariel Rivera 1175 19
24 Arlo Guthrie 3838 113
25 Arrogant Worms 5323 89
26 Avril Lavigne 3579 143
27 Backstreet Boys 3792 164
28 Barbie 1073 18
29 Barbra Streisand 4566 157
30 Beach Boys 4204 151
31 Beautiful South 5306 149
32 Beauty And The Beast 1062 12
33 Bee Gees 3529 170
34 Bette Midler 5571 158
35 Bill Withers 1267 35
36 Billie Holiday 2954 150
37 Billy Joel 4664 141
38 Bing Crosby 4184 157
39 Black Sabbath 4066 156
40 Blur 3512 136
41 Bob Dylan 7384 188
42 Bob Marley 2806 86
43 Bob Rivers 2739 48
44 Bob Seger 4214 158
45 Bon Jovi 4549 181
46 Boney M. 3040 98
47 Bonnie Raitt 3619 149
48 Bosson 1800 52
49 Bread 2052 75
50 Britney Spears 4198 158
51 Bruce Springsteen 5731 175
52 Bruno Mars 2748 70
53 Bryan White 1713 48
54 Cake 2403 73
55 Carly Simon 4058 168
56 Carol Banawa 987 23
57 Carpenters 2872 112
58 Cat Stevens 2988 114
59 Celine Dion 3545 116
60 Chaka Khan 3873 186
61 Cheap Trick 3534 170
62 Cher 3886 187
63 Chicago 4050 160
64 Children 3233 90
65 Chris Brown 5961 145
66 Chris Rea 3214 182
67 Christina Aguilera 4553 146
68 Christina Perri 1241 36
69 Christmas Songs 4502 140
70 Christy Moore 3968 64
71 Chuck Berry 3859 127
72 Cinderella 1652 48
73 Clash 4884 118
74 Cliff Richard 4311 184
75 Coldplay 2596 120
76 Cole Porter 991 17
77 Conway Twitty 3284 162
78 Counting Crows 4715 158
79 Creedence Clearwater Revival 1597 43
80 Crowded House 2571 76
81 Culture Club 2202 76
82 Cyndi Lauper 4243 161
83 Dan Fogelberg 3120 108
84 Dave Matthews Band 4336 168
85 David Allan Coe 4390 162
86 David Bowie 5354 165
87 David Guetta 1922 63
88 David Pomeranz 798 17
89 Dean Martin 3405 186
90 Death 2080 60
91 Deep Purple 4356 179
92 Def Leppard 3858 147
93 Demi Lovato 3005 105
94 Depeche Mode 3408 167
95 Devo 3320 125
96 Dewa 19 1067 21
97 Diana Ross 3545 167
98 Dire Straits 2646 61
99 Divine 1363 33
100 Dolly Parton 4394 180
101 Don Henley 2215 46
102 Don McLean 3108 67
103 Don Moen 1022 39
104 Donna Summer 4092 191
105 Doobie Brothers 2916 110
106 Doors 3451 97
107 Doris Day 2465 74
108 Drake 7084 117
109 Dream Theater 4528 97
110 Dusty Springfield 3455 175
111 Eagles 1604 41
112 Ed Sheeran 2480 53
113 Eddie Cochran 2828 128
114 Electric Light Orchestra 3749 148
115 Ella Fitzgerald 3775 163
116 Ellie Goulding 2414 77
117 Elton John 4523 175
118 Elvis Costello 5013 146
119 Elvis Presley 3532 168
120 Eminem 7450 70
121 Emmylou Harris 4097 173
122 Engelbert Humperdinck 2574 125
123 Enigma 1283 35
124 Enrique Iglesias 2429 79
125 Enya 1318 40
126 Eppu Normaali 178 3
127 Erasure 3392 138
128 Eric Clapton 2957 152
129 Erik Santos 1414 40
130 Etta James 2393 94
131 Europe 1981 82
132 Eurythmics 3805 142
133 Evanescence 1955 77
134 Everclear 3165 123
135 Everlast 4987 78
136 Exo 417 3
137 Exo-K 332 2
138 Extreme 2504 70
139 Fabolous 8854 115
140 Face To Face 1967 93
141 Faces 1647 37
142 Faith Hill 3088 109
143 Faith No More 2794 71
144 Falco 1154 18
145 Fall Out Boy 3694 97
146 Fastball 1920 59
147 Fatboy Slim 1954 40
148 Fifth Harmony 946 17
149 Fiona Apple 2296 56
150 Fleetwood Mac 3153 180
151 Flo-Rida 4266 59
152 Foo Fighters 3425 142
153 Foreigner 2403 86
154 Frank Sinatra 3971 154
155 Frank Zappa 5081 99
156 Frankie Goes To Hollywood 1488 28
157 Frankie Laine 1682 41
158 Frankie Valli 1488 48
159 Freddie Aguilar 756 13
160 Freddie King 926 30
161 Free 1350 39
162 Freestyle 2256 46
163 Fun. 1322 18
164 Garth Brooks 3026 85
165 Gary Numan 3320 170
166 Gary Valenciano 2381 56
167 Genesis 4789 141
168 George Formby 1431 17
169 George Harrison 4114 157
170 George Jones 3275 137
171 George Michael 3933 122
172 George Strait 3849 188
173 Gino Vannelli 2707 94
174 Gipsy Kings 216 5
175 Glee 5012 164
176 Glen Campbell 3288 159
177 Gloria Estefan 2972 102
178 Gloria Gaynor 1812 49
179 GMB 521 13
180 Gordon Lightfoot 4700 189
181 Grand Funk Railroad 2692 89
182 Grateful Dead 5090 165
183 Grease 1656 31
184 Great Big Sea 3709 93
185 Green Day 4587 174
186 Gucci Mane 6936 84
187 Guided By Voices 2840 97
188 Guns N' Roses 3395 93
189 Halloween 1074 18
190 Hank Snow 4089 158
191 Hank Williams 3888 160
192 Hank Williams Jr. 4848 185
193 Hanson 3228 129
194 Happy Mondays 1220 24
195 Harry Belafonte 2553 73
196 Harry Connick, Jr. 3742 145
197 Heart 3153 122
198 Helloween 4495 162
199 High School Musical 1130 18
200 Hillsong 2377 172
201 Hillsong United 2517 164
202 HIM 2377 98
203 Hollies 3663 154
204 Hooverphonic 1527 56
205 Horrible Histories 5102 54
206 Housemartins 1163 23
207 Howard Jones 3073 102
208 Human League 2589 71
209 Ian Hunter 3353 77
210 Ice Cube 7372 80
211 Idina Menzel 1864 33
212 Iggy Pop 4981 177
213 Il Divo 1129 23
214 Imagine Dragons 1480 41
215 Imago 810 15
216 Imperials 1248 27
217 Incognito 1920 60
218 Incubus 4447 118
219 Independence Day 796 11
220 Indiana Bible College 2168 93
221 Indigo Girls 5845 184
222 Ingrid Michaelson 1741 69
223 Inna 1001 36
224 Insane Clown Posse 10406 136
225 Inside Out 1063 20
226 INXS 3013 140
227 Iron Butterfly 530 17
228 Iron Maiden 5037 156
229 Irving Berlin 2507 70
230 Isley Brothers 2561 73
231 Israel 728 28
232 Israel Houghton 928 24
233 Iwan Fals 1037 19
234 J Cole 5423 68
235 Jackson Browne 3959 139
236 James Taylor 5341 177
237 Janis Joplin 3048 106
238 Jason Mraz 4184 101
239 Jennifer Lopez 3866 110
240 Jim Croce 2236 66
241 Jimi Hendrix 3949 127
242 Jimmy Buffett 6773 164
243 John Denver 4456 168
244 John Legend 3259 93
245 John Martyn 1825 61
246 John McDermott 2845 63
247 John Mellencamp 4479 152
248 John Prine 5286 170
249 John Waite 2504 86
250 Johnny Cash 5167 183
251 Joni Mitchell 6898 170
252 Jose Mari Chan 959 19
253 Joseph And The Amazing Technicolor Dreamcoat 706 8
254 Josh Groban 2825 85
255 Journey 3472 150
256 Joy Division 1969 49
257 Judas Priest 4830 159
258 Judds 2178 71
259 Judy Garland 4278 137
260 Justin Bieber 4403 131
261 Justin Timberlake 3111 60
262 Kanye West 7442 106
263 Kari Jobe 1002 38
264 Kate Bush 5085 153
265 Katy Perry 3422 89
266 Keith Green 2467 63
267 Keith Urban 3066 110
268 Kelly Clarkson 3747 157
269 Kelly Family 2806 97
270 Kenny Chesney 4921 173
271 Kenny Loggins 3891 153
272 Kenny Rogers 3947 174
273 Kid Rock 5542 100
274 Kim Wilde 2093 70
275 King Crimson 2248 44
276 King Diamond 3822 112
277 Kinks 5661 170
278 Kirk Franklin 3375 111
279 Kirsty Maccoll 3125 108
280 Kiss 3804 183
281 Koes Plus 348 10
282 Korn 4632 166
283 Kris Kristofferson 4240 170
284 Kyla 1751 46
285 Kylie Minogue 3790 172
286 Lady Gaga 4894 137
287 Lana Del Rey 4119 113
288 Lata Mangeshkar 1432 32
289 Lauryn Hill 3505 48
290 Lea Salonga 2521 74
291 Leann Rimes 3688 158
292 Lenny Kravitz 3224 156
293 Leo Sayer 2562 85
294 Leonard Cohen 4515 116
295 Les Miserables 2741 42
296 Lil Wayne 9534 125
297 Linda Ronstadt 2875 150
298 Linkin Park 4539 125
299 Lionel Richie 2793 121
300 Little Mix 1635 35
301 Little Walter 486 13
302 LL Cool J 10104 113
303 Lloyd Cole 2543 86
304 Lorde 928 15
305 Loretta Lynn 3153 187
306 Lou Reed 5415 164
307 Louis Armstrong 3680 129
308 Louis Jordan 1515 27
309 Lucky Dube 2357 92
310 Luther Vandross 3489 137
311 Lynyrd Skynyrd 3562 144
312 Madonna 3422 88
313 Manowar 3188 88
314 Mariah Carey 4895 159
315 Marianne Faithfull 4206 160
316 Marillion 6002 149
317 Marilyn Manson 4738 166
318 Mark Ronson 1075 18
319 Maroon 5 3056 110
320 Mary Black 3480 108
321 Matt Monro 1286 41
322 Matt Redman 1981 93
323 Mazzy Star 1191 46
324 Mc Hammer 2004 18
325 Meat Loaf 3511 92
326 Megadeth 5048 133
327 Men At Work 1536 33
328 Metallica 5213 155
329 Michael Bolton 3728 167
330 Michael Buble 2799 112
331 Michael Jackson 4928 176
332 Michael W. Smith 3832 176
333 Migos 1826 15
334 Miley Cyrus 4288 147
335 Misfits 3137 122
336 Modern Talking 2650 144
337 Moody Blues 3583 174
338 Morrissey 4399 177
339 Mud 1074 26
340 Nat King Cole 3296 149
341 Natalie Cole 4269 155
342 Natalie Grant 2106 65
343 Natalie Imbruglia 2047 72
344 Nazareth 4401 184
345 Ne-Yo 5049 146
346 Neil Diamond 3507 173
347 Neil Sedaka 3191 97
348 Neil Young 4538 185
349 New Order 2434 99
350 Next To Normal 652 9
351 Nick Cave 6299 172
352 Nick Drake 1975 67
353 Nickelback 3094 92
354 Nicki Minaj 6033 88
355 Nightwish 3115 81
356 Nina Simone 3687 158
357 Nine Inch Nails 2660 108
358 Nirvana 2736 103
359 Nitty Gritty Dirt Band 4090 125
360 Noa 3742 90
361 NOFX 5011 143
362 Norah Jones 2462 105
363 Notorious B.I.G. 5648 50
364 O-Zone 621 12
365 O.A.R. 2837 78
366 Oasis 3030 149
367 Ocean Colour Scene 2319 100
368 Offspring 3422 119
369 Ofra Haza 1994 41
370 Oingo Boingo 4222 103
371 Old 97's 2504 74
372 Oliver 1751 32
373 Olivia Newton-John 3313 146
374 Olly Murs 2037 54
375 Omd 2085 78
376 One Direction 2646 98
377 OneRepublic 1408 36
378 Opeth 2543 59
379 Orphaned Land 1219 20
380 Oscar Hammerstein 1387 20
381 Otis Redding 2564 112
382 Our Lady Peace 2560 99
383 Out Of Eden 1947 36
384 Outkast 6521 84
385 Overkill 5288 135
386 Owl City 3453 77
387 Ozzy Osbourne 4248 157
388 P!nk 4566 120
389 Passenger 1806 35
390 Pat Benatar 2922 106
391 Patsy Cline 1708 88
392 Patti Smith 4456 104
393 Paul McCartney 4007 169
394 Paul Simon 5289 156
395 Pearl Jam 4983 164
396 Perry Como 3701 148
397 Pet Shop Boys 4755 164
398 Peter Cetera 1842 75
399 Peter Gabriel 4043 96
400 Peter Tosh 1883 50
401 Pharrell Williams 3033 30
402 Phil Collins 2794 114
403 Phineas And Ferb 3121 67
404 Phish 5511 163
405 Pink Floyd 4100 111
406 Pitbull 4366 72
407 Planetshakers 1528 116
408 Pogues 4193 100
409 Point Of Grace 3048 113
410 Poison 3198 96
411 Pretenders 3163 96
412 Primus 3763 83
413 Prince 4676 106
414 Proclaimers 2521 76
415 Procol Harum 2978 86
416 Puff Daddy 5972 61
417 Q-Tip 2487 17
418 Qntal 631 8
419 Quarashi 1187 10
420 Quarterflash 985 23
421 Quasi 1005 25
422 Queen 4664 163
423 Queen Adreena 1366 41
424 Queen Latifah 3553 50
425 Queens Of The Stone Age 2234 68
426 Queensryche 3465 91
427 Quicksand 970 19
428 Quicksilver Messenger Service 811 15
429 Quiet Riot 2181 57
430 Quietdrive 1429 36
431 Quincy Jones 1055 19
432 Quincy Punx 1149 18
433 R. Kelly 6207 145
434 Radiohead 3535 150
435 Raffi 1246 36
436 Rage Against The Machine 3790 55
437 Rainbow 2656 64
438 Rammstein 1706 44
439 Ramones 3630 172
440 Randy Travis 4129 177
441 Rascal Flatts 3266 111
442 Ray Boltz 2759 97
443 Ray Charles 4110 167
444 Reba Mcentire 3709 187
445 Red Hot Chili Peppers 5828 173
446 Regine Velasquez 3007 101
447 Religious Music 2920 83
448 Rem 1828 42
449 Reo Speedwagon 3128 122
450 Richard Marx 3099 121
451 Rick Astley 1544 56
452 Rihanna 4689 143
453 Robbie Williams 5468 166
454 Rod Stewart 5118 178
455 Rolling Stones 4598 179
456 Roxette 3384 138
457 Roxy Music 2461 64
458 Roy Orbison 3402 178
459 Rush 5622 175
460 Sam Smith 804 20
461 Santana 2512 103
462 Savage Garden 1428 29
463 Scorpions 3124 167
464 Selah 1552 48
465 Selena Gomez 1730 44
466 Sia 2262 77
467 Side A 588 11
468 Slayer 4849 119
469 Smiths 2332 67
470 Snoop Dogg 6552 71
471 Soundgarden 2108 72
472 Soundtracks 284 3
473 Spandau Ballet 1683 42
474 Squeeze 4803 146
475 Starship 1436 34
476 Status Quo 3168 162
477 Steely Dan 3203 88
478 Steve Miller Band 2524 109
479 Stevie Ray Vaughan 911 27
480 Stevie Wonder 4099 139
481 Sting 3641 90
482 Stone Roses 1475 37
483 Stone Temple Pilots 2038 62
484 Styx 3732 121
485 Sublime 3110 63
486 Supertramp 2487 74
487 System Of A Down 2481 67
488 Talking Heads 3298 77
489 Taylor Swift 2887 81
490 Tears For Fears 2442 66
491 Ten Years After 1235 47
492 The Beatles 3595 178
493 The Broadways 1192 14
494 The Jam 3263 86
495 The Killers 2969 75
496 The Monkees 3523 148
497 The Script 1706 32
498 The Temptations 3684 117
499 The Weeknd 1365 29
500 The White Stripes 2035 64
501 Thin Lizzy 3378 109
502 Tiffany 1692 49
503 Tim Buckley 2079 58
504 Tim McGraw 4120 148
505 Tina Turner 2913 112
506 Tom Jones 3323 141
507 Tom Lehrer 2369 23
508 Tom T. Hall 4400 160
509 Tom Waits 3296 70
510 Tool 2099 36
511 Tori Amos 3928 110
512 Toto 3190 127
513 Townes Van Zandt 3177 90
514 Tracy Chapman 2530 82
515 Tragically Hip 4804 132
516 Train 2621 81
517 Travis 2236 88
518 Twenty One Pilots 1981 33
519 U-Kiss 82 1
520 U. D. O. 721 12
521 U2 3956 133
522 UB40 4341 140
523 Ufo 3022 95
524 Ugly Kid Joe 1548 36
525 Ultramagnetic Mc's 2285 16
526 Ultravox 2322 61
527 Uncle Kracker 2099 42
528 Uncle Tupelo 1478 40
529 Underoath 1850 46
530 Underworld 1941 41
531 Unearth 1706 39
532 Ungu 96 2
533 Unkle 1244 28
534 Unknown 329 4
535 Unseen 1490 35
536 Unwritten Law 1531 42
537 Uriah Heep 3349 168
538 Used 2310 67
539 Usher 4623 117
540 Utada Hikaru 3064 51
541 Utopia 2856 86
542 Van Der Graaf Generator 780 5
543 Van Halen 3684 102
544 Van Morrison 3987 150
545 Vanessa Williams 1971 68
546 Vangelis 855 18
547 Vanilla Ice 3539 35
548 Various Artists 230 3
549 Velvet Underground 1803 48
550 Vengaboys 598 13
551 Venom 3517 90
552 Vera Lynn 508 12
553 Vertical Horizon 1822 58
554 Veruca Salt 1621 48
555 Verve 1458 51
556 Vince Gill 3064 171
557 Violent Femmes 2547 83
558 Virgin Steele 2706 50
559 Vonda Shepard 1848 61
560 Vybz Kartel 2845 37
561 W.A.S.P. 3248 112
562 Walk The Moon 560 11
563 Wanda Jackson 2703 145
564 Wang Chung 1246 28
565 Warren Zevon 3490 105
566 Waterboys 2730 72
567 Waylon Jennings 3240 152
568 Ween 3168 97
569 Weezer 2845 89
570 Weird Al Yankovic 6422 106
571 Westlife 2936 134
572 Wet Wet Wet 1766 58
573 Wham! 1296 21
574 Whiskeytown 1584 53
575 Whitesnake 2515 103
576 Whitney Houston 2771 93
577 Who 4980 163
578 Widespread Panic 3423 100
579 Will Smith 3473 30
580 Willie Nelson 3253 148
581 Wilson Phillips 1258 32
582 Wilson Pickett 1034 24
583 Wishbone Ash 2910 102
584 Within Temptation 2124 71
585 Wiz Khalifa 3264 40
586 Wu-Tang Clan 7684 53
587 Wyclef Jean 3975 45
588 X 978 18
589 X Japan 869 12
590 X-Raided 1286 7
591 X-Ray Spex 915 22
592 X-Treme 38 1
593 Xandria 1021 25
594 Xavier Naidoo 373 3
595 Xavier Rudd 1270 40
596 Xentrix 568 9
597 Xiu Xiu 1021 25
598 Xscape 1595 37
599 XTC 4996 144
600 Xzibit 5684 47
601 Yazoo 991 23
602 Yeah Yeah Yeahs 1468 50
603 Yelawolf 2080 14
604 Yello 1967 57
605 Yellowcard 2023 72
606 Yeng Constantino 613 8
607 Yes 3854 108
608 YG 1431 12
609 Ying Yang Twins 3334 35
610 Yngwie Malmsteen 2651 106
611 Yo Gotti 2539 22
612 Yo La Tengo 1541 47
613 Yoko Ono 2762 85
614 Yolanda Adams 1858 48
615 Yonder Mountain String Band 646 10
616 You Am I 2224 54
617 Young Buck 2495 17
618 Young Dro 1290 7
619 Young Jeezy 4877 57
620 Youngbloodz 2518 19
621 Youth Of Today 864 22
622 Yukmouth 1612 11
623 Yung Joc 1577 9
624 Yusuf Islam 739 15
625 Z-Ro 5242 54
626 Zac Brown Band 1531 31
627 Zakk Wylde 915 21
628 Zao 1305 30
629 Zayn Malik 780 15
630 Zazie 196 2
631 Zebra 796 20
632 Zebrahead 3337 76
633 Zed 84 1
634 Zero 7 949 24
635 Zeromancer 888 30
636 Ziggy Marley 2462 64
637 Zoe 69 1
638 Zoegirl 1595 38
639 Zornik 460 12
640 Zox 1019 21
641 Zucchero 1343 30
642 Zwan 685 14
643 ZZ Top 3765 132
library(plotly)
Attaching package: 'plotly'
The following object is masked from 'package:ggplot2':
last_plot
The following object is masked from 'package:stats':
filter
The following object is masked from 'package:graphics':
layout
# Plot interactive scatter plot
plot_ly(unique_words, x = ~num_songs, y = ~text_combined, text = ~artist,
        type = 'scatter', mode = 'markers') %>%
  layout(title = "Number of Songs Written vs. Number of Unique Words Used",
         xaxis = list(title = "Number of Songs Written"),
         yaxis = list(title = "Number of Unique Words Used"))
The code below compares the vocabulary richness of different artists after adjusting for how much each artist has written. It first counts the total number of words across each artist's lyrics, then divides the artist's vocabulary size (the number of unique words) by that total. The result, stored as weighted_avg_vocabulary, is a ratio of unique words to total words rather than a weighted average in the strict sense; artists with no words are assigned 0 to avoid division by zero. The code then selects the 5 artists with the largest and the 5 artists with the smallest adjusted values and plots the two groups side by side using ggplot. Unlike the earlier ranking, which was based solely on the raw number of unique words, this approach accounts for differences in output, so prolific artists are not favored simply because they have more lyrics. The graphs therefore show the linguistic diversity of artists relative to their output.
# Calculate the total number of words written by each artist
total_words <- aggregate(text_combined ~ artist, data = all_artists_df,
                         FUN = function(x) length(unlist(strsplit(x, "\\s+"))))

# Merge total words with vocabulary dataframe
vocabulary_df <- merge(vocabulary_df, total_words, by = "artist", all.x = TRUE)

# Calculate weighted average vocabulary size, handling division by zero
vocabulary_df$weighted_avg_vocabulary <- ifelse(vocabulary_df$text_combined == 0, 0,
                                                vocabulary_df$vocabulary_size / vocabulary_df$text_combined)

# Sort dataframe by weighted average vocabulary size
vocabulary_df <- vocabulary_df[order(vocabulary_df$weighted_avg_vocabulary, decreasing = TRUE), ]

# Create a subset dataframe for the 5 artists with the largest adjusted vocabulary
top5_largest_adjusted <- head(vocabulary_df, 5)

# Create a subset dataframe for the 5 artists with the smallest adjusted vocabulary
top5_smallest_adjusted <- tail(vocabulary_df, 5)

# Combine the top 5 largest and smallest adjusted dataframes
combined_adjusted_df <- rbind(top5_largest_adjusted, top5_smallest_adjusted)

# Add a column to indicate whether the artist has the largest or smallest adjusted vocabulary
combined_adjusted_df$category <- ifelse(combined_adjusted_df$artist %in% top5_largest_adjusted$artist,
                                        "Largest", "Smallest")

# Reorder the artists within each category
combined_adjusted_df$artist <- factor(combined_adjusted_df$artist,
                                      levels = combined_adjusted_df$artist[order(combined_adjusted_df$category,
                                                                                 combined_adjusted_df$weighted_avg_vocabulary)])

# Plot the combined adjusted data
ggplot(combined_adjusted_df, aes(x = reorder(artist, weighted_avg_vocabulary), y = weighted_avg_vocabulary)) +
  geom_bar(stat = "identity", aes(fill = category)) +
  facet_wrap(~ category, scales = "free", nrow = 1) +
  labs(title = "Top 5 Artists with Largest and Smallest Adjusted Vocabulary",
       x = "Artist",
       y = "Weighted Average Vocabulary Size",
       fill = "Category") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
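To make the adjusted measure concrete, here is a minimal base-R sketch of the same calculation (vocabulary size divided by total word count) on two made-up artists. The data frame toy_lyrics, its contents, and the helper tokenize() are illustrative assumptions and are not part of the report's data or code.

```r
# Toy data: hypothetical lyrics, not taken from the dataset
toy_lyrics <- data.frame(
  artist = c("Artist A", "Artist B"),
  text_combined = c("la la la la love", "stars fall over quiet rivers"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: lowercase the text and split it on whitespace
tokenize <- function(x) {
  words <- unlist(strsplit(tolower(x), "\\s+"))
  words[nzchar(words)]
}

# Total words and unique words per artist
toy_lyrics$total_words <- vapply(toy_lyrics$text_combined,
                                 function(x) length(tokenize(x)),
                                 integer(1), USE.NAMES = FALSE)
toy_lyrics$vocabulary_size <- vapply(toy_lyrics$text_combined,
                                     function(x) length(unique(tokenize(x))),
                                     integer(1), USE.NAMES = FALSE)

# Ratio of unique words to total words, guarding against division by zero
toy_lyrics$ratio <- ifelse(toy_lyrics$total_words == 0, 0,
                           toy_lyrics$vocabulary_size / toy_lyrics$total_words)

print(toy_lyrics[, c("artist", "total_words", "vocabulary_size", "ratio")])
```

Both toy artists have five words in total, but Artist A repeats one word four times, so its ratio is 2/5 = 0.4 while Artist B, with no repeats, scores 5/5 = 1.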
Question 3: Are there any trends or patterns in the sentiment of song lyrics over time or across genres?
We can perform sentiment analysis on the lyrics to determine the overall sentiment of songs.
# Perform sentiment analysis
# Assuming you have a function or package for sentiment analysis, replace
# `sentiment_analysis_function` with the actual function
# sentiment_scores <- sentiment_analysis_function(all_words)

# Plotting sentiment trends over time or genres
# You can plot sentiment scores over time or genres as per your dataset structure and analysis requirements
# Example:
# ggplot(sentiment_scores, aes(x = date, y = sentiment_score, color = genre)) +
#   geom_line() +
#   labs(title = "Sentiment Trends in Song Lyrics",
#        x = "Date",
#        y = "Sentiment Score")
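As one possible way to fill in the placeholder above, the following sketch scores lyrics against a word list using base R only. The six-word lexicon, the toy_songs data frame, and the function score_lyrics() are all hypothetical; a real analysis would substitute a published sentiment lexicon. Because the dataset described in the Background has only the columns artist, song, link, and text, the sketch groups by artist; trends over time or genre would need date or genre metadata from another source.

```r
# Hypothetical mini-lexicon: +1 for positive words, -1 for negative words
lexicon <- c(love = 1, happy = 1, sweet = 1, cry = -1, lonely = -1, pain = -1)

# Toy songs using the dataset's column names; the contents are made up
toy_songs <- data.frame(
  artist = c("Artist A", "Artist A", "Artist B"),
  song = c("Song 1", "Song 2", "Song 3"),
  text = c("I love you sweet happy day",
           "I cry lonely tonight",
           "Love and pain"),
  stringsAsFactors = FALSE
)

# Sum the lexicon values of all words in one song; unknown words contribute 0
score_lyrics <- function(text) {
  words <- unlist(strsplit(tolower(text), "[^a-z']+"))
  words <- words[nzchar(words)]
  sum(lexicon[words], na.rm = TRUE)
}

toy_songs$sentiment_score <- vapply(toy_songs$text, score_lyrics,
                                    numeric(1), USE.NAMES = FALSE)

# Average sentiment per artist
artist_sentiment <- aggregate(sentiment_score ~ artist, data = toy_songs, FUN = mean)
print(artist_sentiment)
```

A summed score is sensitive to song length, so dividing by the number of words per song may be a fairer basis for comparison on the full dataset.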
Question 4: Which artists tend to use the most profanity?
We can gauge profanity in song lyrics by tallying profane instances and normalizing the counts for a fair assessment.
# Assuming you have a profanity list and a function for tallying profane instances, replace
# `profanity_tally_function` with the actual function
# profanity_counts <- profanity_tally_function(all_words)

# Plotting profanity trends across artists, genres, or time periods
# Example:
# ggplot(profanity_counts, aes(x = artist, y = profanity_count, fill = genre)) +
#   geom_bar(stat = "identity") +
#   labs(title = "Profanity Usage Across Artists and Genres",
#        x = "Artist",
#        y = "Profanity Count") +
#   theme(axis.text.x = element_text(angle = 45, hjust = 1))
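The tally-and-normalize step could look roughly like the base-R sketch below. The two-word profanity_list uses mild placeholder tokens, and profanity_toy and tokenize_words() are likewise assumptions made for illustration; the actual analysis would need a compiled profanity list. Counts are expressed per 1,000 words so that artists with more lyrics are not penalized for volume alone.

```r
# Placeholder tokens standing in for a real profanity list (assumption)
profanity_list <- c("darn", "heck")

# Toy data: hypothetical lyrics, not taken from the dataset
profanity_toy <- data.frame(
  artist = c("Artist A", "Artist B"),
  text_combined = c("darn it all to heck and darn again",
                    "sweet dreams tonight my dear"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: lowercase and split on anything that is not a letter or apostrophe
tokenize_words <- function(text) {
  words <- unlist(strsplit(tolower(text), "[^a-z']+"))
  words[nzchar(words)]
}

# Total words and profane words per artist
profanity_toy$total_words <- vapply(profanity_toy$text_combined,
                                    function(x) length(tokenize_words(x)),
                                    integer(1), USE.NAMES = FALSE)
profanity_toy$profanity_count <- vapply(profanity_toy$text_combined,
                                        function(x) sum(tokenize_words(x) %in% profanity_list),
                                        integer(1), USE.NAMES = FALSE)

# Normalize to a rate per 1,000 words, guarding against division by zero
profanity_toy$profanity_per_1000 <- ifelse(
  profanity_toy$total_words == 0, 0,
  1000 * profanity_toy$profanity_count / profanity_toy$total_words
)

# Rank artists from most to least profane
profanity_toy <- profanity_toy[order(profanity_toy$profanity_per_1000, decreasing = TRUE), ]
print(profanity_toy[, c("artist", "profanity_count", "total_words", "profanity_per_1000")])
```

In this toy example Artist A uses 3 listed words out of 8, giving a rate of 375 per 1,000 words, while Artist B uses none.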